17 Genomics

nucleotides are fluorescently labelled dideoxynucleotides lacking the 3′-hydroxyl group

necessary for chain extension. Hybridization of the primer to the marker initiates

DNA polymerization templated by the unknown sequence. Whenever one of the

dideoxynucleotides is incorporated, extension of that chain is terminated. After the

system has been allowed to run for a time, such that all possible lengths may be

presumed to have been synthesized, the DNA is denatured into single strands and

separated electrophoretically on a gel. The electrophoretogram (sometimes referred

to as an electropherogram) shows successive peaks differing in size by one nucleotide.

Since the dideoxynucleotides are labelled with a different fluorophore for each base,

the successive nucleotides in the unknown sequence can be read off by observing

the fluorescence of the consecutive peaks.
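
The read-off step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dye colours in `DYE_TO_BASE`, the function names and the `(length, dye)` data layout are hypothetical, the template is written in the order in which the polymerase traverses it, and every possible fragment length is assumed to be present exactly once.

```python
# Illustrative sketch only; names, dye colours and data layout are assumptions.

# Hypothetical mapping from fluorophore colour to the base carried by the
# terminating dideoxynucleotide (real dye sets differ between instruments).
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragments(template, primer_len=0):
    """Enumerate one chain-terminated product per template position.

    The template is given in the order the polymerase reads it. A product
    extended by n nucleotides ends in the dideoxynucleotide complementary
    to template[n - 1]. Returns a list of (length, dye) pairs.
    """
    base_to_dye = {base: dye for dye, base in DYE_TO_BASE.items()}
    fragments = []
    for n in range(1, len(template) + 1):
        terminal_base = COMPLEMENT[template[n - 1]]
        fragments.append((primer_len + n, base_to_dye[terminal_base]))
    return fragments


def read_sequence(fragments):
    """Read the synthesized strand from peaks ordered by fragment length."""
    ordered = sorted(fragments)  # electrophoresis orders the fragments by size
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


if __name__ == "__main__":
    peaks = terminated_fragments("TACGGT")
    # The order in which fragments are supplied does not matter.
    print(read_sequence(reversed(peaks)))
```

The sequence obtained is that of the newly synthesized strand; the unknown template follows by complementation.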

A useful approach for very long unknown sequences (such as whole genomes)

is to randomly fragment the entire genome (e.g., using ultrasound). The fragments,

each approximately two kilobases long and sufficient to cover the genome fivefold

to tenfold, are cloned into a plasmid vector, 4 introduced into bacteria and

multiplied. The extracted and purified DNA fragments are then sequenced as above.

The presence of overlaps allows the original sequence to be reconstructed. 5 This

method is usually called shotgun sequencing. 6 Of course, overlaps are not guaranteed,

but gaps can be filled in principle by conventional sequencing. 7 The rival

method is called bacterial artificial chromosome (BAC) assembly, 8 in which large

fragments of DNA are cloned into a bacterial plasmid or other vector; the fragments

are then sequenced and combined into a single sequence. Being more precise and

producing a more contiguous sequence than the shotgun method, BAC assembly is

often used to assemble large genomes and can be used for the analysis of complex

genetic structures.
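
How overlaps between shotgun fragments allow the original sequence to be reconstructed can be illustrated with a toy greedy merge. This is a minimal sketch, not the procedure used by any production assembler: the function names and the `min_len` threshold are assumptions, the reads are taken to be error-free and from a single strand, and repeated regions (which defeat a purely greedy strategy) are ignored.

```python
# Toy greedy overlap assembly; a simplified stand-in for real assemblers.

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len are treated as chance matches and
    reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap.

    Returns the remaining sequences; more than one means that gaps are
    left between them.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Discard reads wholly contained in another read.
    contigs = [r for r in contigs
               if not any(r != s and r in s for s in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no usable overlaps remain
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    print(greedy_assemble(["GTACGTTA", "ATGCGTAC", "CGTTAGC"]))
    print(greedy_assemble(["AAACCC", "GGGTTT"]))  # no overlap: a gap remains
```

When the fragments do not overlap, the function returns several separate sequences, which corresponds to the gaps that would have to be closed by conventional sequencing.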

Every aspect of sequencing (reagents, procedures, separation methods, etc.) has,

of course, been subject to much development and improvement since its invention

(in Sanger’s original method, the dideoxynucleotides were radioactively labelled),

and there are now high-throughput automated methods in routine use.

Another popular technique is pyrosequencing, whereby one kind of nucleotide

only is added to the polymerizing complementary chain; if it is complementary

to the unknown sequence at the actual position, pyrophosphate is released upon

incorporation of the complementary nucleotide. Using some other reagents, this

is converted to ATP, which drives the oxidation of luciferin by the enzyme

luciferase, yielding a brief pulse of detectable light. The technique is suitable for

automation. It is, however, practically limited to sequencing strands shorter than

about 150 base pairs.
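
The logic of pyrosequencing, in which one kind of nucleotide is offered at a time and light signals incorporation, can be sketched as follows. The code is an idealization under stated assumptions: the function names, the cyclic dispensation order `ACGT` and the `max_cycles` limit are hypothetical, the template is written in the order the polymerase reads it, and the light signal is taken to be exactly proportional to the number of nucleotides incorporated in one dispensation.

```python
# Idealized pyrosequencing simulation; names and parameters are assumptions.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrogram(template, dispensation_order="ACGT", max_cycles=100):
    """Simulate light signals for cyclic nucleotide dispensation.

    Returns a list of (nucleotide, signal) pairs. The signal is the number
    of nucleotides incorporated in that dispensation; 0 means no light.
    """
    pos = 0
    signals = []
    for _ in range(max_cycles):
        for nt in dispensation_order:
            if pos >= len(template):
                return signals
            count = 0
            # A run of identical template bases is extended in one step,
            # giving a proportionally stronger pulse.
            while pos < len(template) and COMPLEMENT[template[pos]] == nt:
                count += 1
                pos += 1
            signals.append((nt, count))
    return signals


def decode(signals):
    """Reconstruct the synthesized strand from the recorded signals."""
    return "".join(nt * count for nt, count in signals)


if __name__ == "__main__":
    trace = pyrogram("TTGCA")
    print(trace)
    print(decode(trace))
```

In practice the proportionality between signal and run length degrades for long runs of the same base, which is one reason why this idealized decoding is harder on real data.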

New techniques are constantly being developed, with special interest being shown

in single-molecule sequencing, which would obviate the need for amplification of

4 In this context, “vector” is used in the sense of vehicle.

5 This is somewhat related to Kruskal’s multidimensional scaling (MD-SCAL or MDS) analysis.

6 Venter et al. (2001).

7 Unambiguously assembled nonoverlapping sequences are called “contigs”.

8 IHGSC (2001).